NetLogger: A Toolkit for Distributed System Performance Tuning and Debugging
Authors
Abstract
Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. Determining the source of these problems requires detailed end-to-end instrumentation of all components, including the applications, operating systems, hosts, and networks. In this paper we describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generating timestamped event logs that can be used to provide detailed end-to-end application- and system-level monitoring, and tools for visualizing the log data and the real-time state of the distributed system. This methodology, called NetLogger, has proven invaluable for diagnosing problems in networks and in distributed systems code. The approach is novel in that it combines network, host, and application-level monitoring, providing a complete view of the entire system. NetLogger is designed to be extremely lightweight, and includes a mechanism for reliably collecting monitoring events from multiple distributed locations.
Keywords: distributed systems performance analysis and debugging
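To make the idea of timestamped event logs concrete, the following is a minimal Python sketch, not NetLogger's actual API or log format. The function names (`log_event`, `stage_latencies`, `slowest_stage`), the record fields, and the event names are all hypothetical, and the sketch assumes that clocks on the participating hosts are synchronized closely enough for cross-host timestamps to be comparable. It records one event per stage as a data block moves through a system, then computes the elapsed time between consecutive events to show where the time went.

```python
import time
from collections import defaultdict


def log_event(log, event, obj_id, host, ts=None):
    """Append one timestamped event record to an in-memory log (hypothetical format)."""
    log.append({
        "ts": time.time() if ts is None else ts,  # wall-clock timestamp in seconds
        "host": host,
        "event": event,
        "id": obj_id,  # identifies the object (e.g. a data block) being traced
    })


def stage_latencies(log):
    """For each traced id, return (from_event, to_event, seconds) for consecutive events."""
    by_id = defaultdict(list)
    for rec in log:
        by_id[rec["id"]].append(rec)
    result = {}
    for obj_id, recs in by_id.items():
        recs.sort(key=lambda r: r["ts"])  # order events along the object's lifeline
        result[obj_id] = [
            (a["event"], b["event"], b["ts"] - a["ts"])
            for a, b in zip(recs, recs[1:])
        ]
    return result


def slowest_stage(log, obj_id):
    """Return the consecutive event pair with the largest elapsed time for one id."""
    return max(stage_latencies(log)[obj_id], key=lambda stage: stage[2])


# Illustrative events with made-up timestamps (seconds) from two hypothetical hosts.
demo_log = []
log_event(demo_log, "client.request.sent", "block-1", "hostA", ts=0.0)
log_event(demo_log, "server.request.received", "block-1", "hostB", ts=0.25)
log_event(demo_log, "server.disk.read.done", "block-1", "hostB", ts=2.25)
log_event(demo_log, "client.response.received", "block-1", "hostA", ts=2.5)

print(slowest_stage(demo_log, "block-1"))
```

In this made-up trace the largest gap lies between the server receiving the request and finishing its disk read, which would point to the host rather than the network as the place to look first.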
Similar resources
NetLogger: A Toolkit for Distributed System Performance Analysis
Diagnosis and debugging of performance problems on complex distributed systems requires end-to-end performance information at both the application and system level. We describe a methodology, called NetLogger, that enables real-time diagnosis of performance problems in such systems. The methodology includes tools for generating precision event logs, an interface to a system event-monitoring frame...
Scalable Analysis of Distributed Workflow Traces
Large-scale workflows are becoming increasingly important in both the scientific research and business domains. Science and commerce have both experienced an explosion in the sheer amount of data that must be analyzed. An important tool for analyzing these huge data sets is a compute “cluster” of hundreds or thousands of machines. However, debugging and tuning clusters requires specialized tool...
On-Demand Grid Application Tuning and Debugging with the NetLogger Activation Service
A typical Grid computing scenario involves many distributed hardware and software components. The more components that are involved, the more likely one of them may fail. In order for Grid computing to succeed, there must be a simple mechanism to determine which component failed and why. Instrumentation of all Grid applications and middleware is an important component in the solution to this p...
Using NetLogger for Distributed Systems Performance Analysis of the BaBar Data Analysis System
Developers and users of high-performance distributed systems often observe performance problems, the reasons for which are rarely obvious. Bottlenecks can occur in any of the components along the paths through which the data flows: the applications, the operating systems, the hosts, or the network. We have developed a methodology, known as NetLogger, for detailed, end-to-end, top-to-bottom moni...
Scalability analysis of three monitoring and information systems: MDS2, R-GMA, and Hawkeye
Monitoring and information system (MIS) implementations provide data about available resources and services within a distributed system, or Grid. A comprehensive performance evaluation of an MIS can aid in detecting potential bottlenecks, advise in deployment, and help improve future system development. In this paper, we analyze and compare the performance of three implementations in a quantita...
Journal title:
Volume, issue:
Pages: -
Publication date: 2003